Using Hearst's Rules for the Automatic Acquisition of Hyponyms for Mining a Pharmaceutical Corpus

Author

  • Michael P. Oakes
Abstract

Fully Automatic Thesaurus Generation (ATG) seeks to generate useful thesauri by mining a corpus of raw text. A number of statistical approaches based on term co-occurrence exist for this, but in general they can only estimate the strength of the relationship between two terms, not its nature. In this paper we implement Hearst's method of discovering the hyponymy relations which are the building blocks of hierarchical thesauri. Starting with the Scrip corpus of newsfeeds in the pharmaceutical domain, we were able to discover an estimated 400 useful term relationships.
A domain-specific thesaurus such as MeSH (MEDLINE) or the Derwent Drug File (DDF) gives an overview of the extent of the domain, and of the categories, relations and named entities within it. Such thesauri typically consist of lists of terms organised according to a semantic hierarchy. Electronic thesauri are used in document retrieval or indexing systems, for expanding queries when searching for information or for selecting a preferred form of a given search term. Experiments such as the Worm Community System have shown that the thesaurus is an excellent memory-jogging device which supports learning and serendipitous browsing. Thesauri prevent users from becoming overwhelmed by the sheer amount of available information, and by the "classical vocabulary problem, which results from the diversity of expertise and backgrounds of systems users" (Chen et al., 95).
Although a number of successful commercially available thesauri created by large teams of human experts exist, manual thesaurus generation is in general prohibitively costly. Grefenstette (94) writes that the ideal might be to use knowledge-poor approaches, starting from just the raw corpus, "if the ultimate goal of ATG (Automatic Thesaurus Generation) is the deduction of semantic relationships exclusively from free text corpora". ATG is thus an example of knowledge discovery in text databases, or text data mining.
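
To make the pattern-based idea concrete, the following is a minimal sketch of one lexico-syntactic pattern of the kind Hearst proposed ("X such as A, B and C"). It is not the paper's implementation: the function name `hyponym_pairs`, the regular expression, and the example sentence are assumptions made for illustration, and the hypernym is crudely taken to be the single word before "such as", where a real system would use noun-phrase chunking.

```python
import re

# Illustrative pattern only: "<hypernym> such as <item>, <item> and <item>".
# Assumption: the hypernym is the one word immediately before "such as".
SUCH_AS = re.compile(r"(\w+),? such as ([^.;]+)")

# Splits an enumeration on commas and on the conjunctions "and" / "or".
SEPARATORS = re.compile(r",\s*(?:and\s+|or\s+)?|\s+and\s+|\s+or\s+")


def hyponym_pairs(sentence):
    """Return (hyponym, hypernym) pairs found by the "such as" pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group(1).lower()
        for item in SEPARATORS.split(match.group(2)):
            item = item.strip().lower()
            if item:
                pairs.append((item, hypernym))
    return pairs


# Hypothetical example sentence, not taken from the Scrip corpus.
pairs = hyponym_pairs(
    "The trial compared analgesics such as aspirin, ibuprofen and paracetamol."
)
```

Because the enumeration is read up to the next full stop or semicolon, this sketch over-captures when the sentence continues after the list; it is meant only to show how a surface pattern yields a relation with a known nature (hyponymy) rather than just a strength.
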
Most existing methods for automatic thesaurus generation are statistical, and rely on the co-occurrence of a pair of terms within a common "window" of text, which may be a fixed number of words, the same syntactic clause, or a common document in a large collection of documents. Details of such approaches were given first by Salton in 1989, and more recently by Pereira et al. (93) and Kageura et al. (00). For each word pair in the corpus vocabulary, such methods are able to generate a numeric score to …
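
The window-based co-occurrence idea can be sketched as follows. This is an illustration under stated assumptions, not a method from the cited works: the function name `cooccurrence_counts`, the fixed window of tokens, and the use of a raw count as the score are all choices made here for simplicity (published methods typically use more refined association measures).

```python
from collections import Counter


def cooccurrence_counts(tokens, window=5):
    """Count unordered pairs of distinct terms that fall within
    `window` consecutive tokens of each other."""
    counts = Counter()
    for i, left in enumerate(tokens):
        # Pair the current token with the next (window - 1) tokens.
        for right in tokens[i + 1 : i + window]:
            if left != right:
                counts[tuple(sorted((left, right)))] += 1
    return counts


# Hypothetical toy input; a real corpus would be tokenised and filtered first.
tokens = "aspirin relieves pain and aspirin reduces fever".split()
counts = cooccurrence_counts(tokens, window=3)
```

A score of this kind says how strongly two terms are associated, but nothing about whether one is a kind of the other, which is the limitation the abstract points out.
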

Similar resources

Semi-automatic construction of a corpus of indirect opinions in the drug domain and its application to opinion polarity detection

Opinion mining is a well-known problem in natural language processing that has attracted increasing attention in recent years. Existing approaches have often focused on identifying direct opinions and ignored indirect ones. However, in some domains, such as medicine, indirect opinions occur frequently. Therefore, ignoring indirect opinions can lead to the loss of valuable information and not...

Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...

Automatic Acquisition of Hyponyms and Meronyms from Question Corpora

We explore how lexical and ontological relations can be acquired automatically from natural language questions. The focus in this paper is on identifying hyponym and meronym relations by using simple pattern matching. It is shown that natural language questions can provide a significant source for ontological information.

Acquisition of Hypernyms and Hyponyms from the WWW

Recently, research in automatic ontology construction has become a hot topic, because of the vision that ontologies will be the core component in realizing the semantic web. This paper presents a method to automatically construct an ontology by mining the web. We introduce an algorithm to automatically acquire hypernyms and hyponyms for any given lexical term using a search engine and natural language pr...

Explain the theoretical and practical model of automatic facade design intelligence in the process of implementing the rules and regulations of facade design and drawing

Artificial intelligence has been trying for decades to create systems with human capabilities, including human-like learning; therefore, the purpose of this study is to discover how to use this field in the process of learning facade design, specifically learning the rules, standards and national regulations related to the design of facades of residential buildings by machine with a machine ...

Publication date 2005